
Topic Detection in Tweets using Hashtags

By: Manzoor, Shazia.
Contributor(s): Siddiqi, Taha Hafeez.
Publisher: New Delhi: STM Journals, 2018
Edition: Vol. 5(2), May-Aug.
Description: p. 6-14
Subject(s): Computer Engineering
In: Recent trends in programming languages
Item type Current location Call number Status Date due Barcode Item holds
Articles Abstract Database Articles Abstract Database School of Engineering & Technology
Archival Section
Not for loan 2021-2021771

Data clustering is a common technique used for topic detection and identification in text. In this paper we propose tokenizing hashtags as an effective and efficient way of generating bags of words for tweets (text messages on Twitter). The document-term matrix generated from these terms is smaller and more relevant than the one produced by full-text tokenization. Tweets from the 2018 Oscars event tagged with the #oscars hashtag were collected and split into sets of 50,000 tweets. Three strategies for tokenizing text and hashtags were tested using two popular data clustering algorithms, K-Means++ and Non-negative Matrix Factorization (NMF). The results were compared in terms of accuracy and performance. We concluded that tokenizing hashtags is a better alternative in terms of performance and achieves results equivalent to those of tokenizing tweet text.
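
The abstract does not spell out the tokenization strategies, so the following is only a minimal sketch of the general idea, under stated assumptions: hashtags are extracted with a simple regular expression, lowercased, and used as the bag of words for each tweet. The function names (hashtag_tokens, text_tokens, document_term_matrix) and the sample tweets are hypothetical and are not taken from the paper. The sketch compares the vocabulary produced by hashtag tokenization with the one produced by naive full-text tokenization.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")
WORD_RE = re.compile(r"\w+")


def hashtag_tokens(tweet):
    """Return the lowercased hashtags of a tweet, without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet)]


def text_tokens(tweet):
    """Return every lowercased word of a tweet (naive full-text tokenization)."""
    return [word.lower() for word in WORD_RE.findall(tweet)]


def document_term_matrix(tweets, tokenizer):
    """Build a dense count matrix: one row per tweet, one column per term."""
    counts = [Counter(tokenizer(tweet)) for tweet in tweets]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, matrix


# Hypothetical sample tweets, invented for illustration only.
tweets = [
    "Great speech tonight #oscars #BestPicture",
    "Who wore it best on the red carpet? #oscars #RedCarpet",
    "Congratulations to the winner! #Oscars #bestpicture",
]

tag_vocab, tag_matrix = document_term_matrix(tweets, hashtag_tokens)
text_vocab, text_matrix = document_term_matrix(tweets, text_tokens)

print("hashtag vocabulary:", tag_vocab)
print("hashtag matrix:", tag_matrix)
print("columns (hashtags vs full text):", len(tag_vocab), "vs", len(text_vocab))
```

On this toy input the hashtag-based matrix has far fewer columns than the full-text one, which illustrates the size reduction the abstract describes. Either matrix could then be passed to a clustering step such as K-Means++ or NMF, which is not shown here.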

Implemented and Maintained by AIKTC-KRRC (Central Library).
For suggestions or queries, contact the library or email: librarian@aiktc.ac.in | Ph: +91 22 27481247